Performance comparison between Java and JNI for optimal implementation of computational micro-kernels

Authors: Charles Henri-Pierre; Halli Nassim A.; Mehaut Jean-François
Publication date: 21/12/2014

General-purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction-level parallelism. Hence, related optimizations are currently critical to improving the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative. It offers programming flexibility and productivity, and performance is ensured by the Just-In-Time (JIT) compiler. However, the JIT compiler suffers from two main drawbacks. First, the JIT is a black box for developers: we have no control over the generated code, nor any feedback from its optimization phases such as vectorization. Second, the compilation time constraint limits the degree of optimization compared to static compilers like GCC or LLVM. It is therefore compelling to use statically compiled code, since it benefits from additional optimizations that reduce performance bottlenecks. Java allows native code to be called from dynamic libraries through the Java Native Interface (JNI). Nevertheless, JNI methods are not inlined and cost more to invoke than Java methods. Therefore, to benefit from better static optimization, this call overhead must be offset by the amount of computation performed at each JNI invocation. In this paper we tackle this problem and carry out this analysis for a set of micro-kernels. Our goal is to select the most efficient implementation given the amount of computation defined by the calling context. We also investigate the performance impact of several optimization schemes: vectorization, out-of-order optimization, data alignment, method inlining and the use of native memory for JNI methods.

Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-CEA

Is dynamic compilation possible for embedded systems?

Authors: Charles Henri-Pierre; Lomüller Victor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2015

JIT compilation and dynamic compilation are powerful techniques that delay final code generation until run time. There are many benefits: improved portability, virtual machine security, etc. Unfortunately, the tools used for JIT compilation and dynamic compilation do not meet the classical requirements of embedded platforms: their memory footprint is huge and code generation has large overheads. In this paper we show how dynamic code specialization (JIT) can be used and be beneficial in terms of execution speed and energy consumption while keeping the memory footprint under control. We base our approach on our tool deGoal and on LLVM, which we extended to produce lightweight runtime specializers from annotated LLVM programs. Benchmarks are transformed into templates, and a specialization routine is built to instantiate them. This approach produces efficient specialization routines with minimal energy consumption and memory footprint compared to a generic JIT application. Through several benchmarks, we present its efficiency in terms of speed, energy and memory footprint. We show that, compared with static compilation, we can achieve a 21% speed-up in execution and a 10% energy reduction with a moderate memory footprint.

HAL-CEA

Self-optimisation using runtime code generation for wireless sensor networks

Authors: Charles Henri-Pierre; Couroussé Damien; Quéva Caroline
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/01/2016

This paper addresses the use of runtime code specialisation in resource-constrained embedded systems such as nodes of a Wireless Sensor Network (WSN), in order to improve software efficiency and hence the lifetime of WSN nodes. In our approach, runtime code specialisation is achieved with in-place runtime code generation. We present a self-optimising system using runtime code generation. Our system automatically decides to generate specialised code and uses it each time an improvement in application performance is observed. In the Internet of Things (IoT), devices usually have limited precision; our system adapts to these devices by decreasing precision in order to increase performance. We evaluate our system on floating-point multiplication using the WisMote platform, where the specialised code executes more than 7 times faster than generic code, all overheads included. To the best of our knowledge, this is the first time that a runtime code generation system has been used to automatically optimise code in devices as constrained as WSN nodes.

Hal - Université Grenoble Alpes

HAL-CEA

Involvement of small-scale dairy farms in an industrial supply chain: When production standards meet farm diversity

Authors: Bernard Jennifer; Hostiou Nathalie; Le Gal Pierre-Yves; Moulin Charles-Henri; Triomphe Bernard
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2011

In certain contexts, dairy firms are supplied by small-scale family farms. Firms provide a set of technical and economic recommendations meant to help farmers meet their requirements in terms of the quantity and quality of milk collected. This study analyzes how such recommendations may be adopted by studying six farms in Brazil. All farms are beneficiaries of the country's agrarian reforms, but they differ in terms of how they developed their activities, their resources and their milk collection objectives. First, we built a technical and economic benchmark farm based on recommendations from a dairy firm and farmer advisory institutions. Our analysis of the farms' practices and technical and economic results shows that none of the farms in the sample applies all of the benchmark recommendations; however, all farms specialized in dairy production observe the main underlying principles with regard to feeding systems and breeding. The decisive factors in whether the benchmark is adopted and successfully implemented are (i) access to the supply chain when a farmer establishes his activity, (ii) a grasp of reproduction and forage production techniques and (iii) an understanding of the principles of dietary rationing for dairy cattle. The technical problems observed in some cases affect the farms' dairy performance and cash position; this can lead to a process of disinvestment. This dynamic of farms facing production standards suggests that the diversity of specialized livestock farmers should be taken into account more effectively through advisory approaches that combine basic zootechnical training with assistance in planning farm activities over the short and medium term. (Author's abstract)

HAL Clermont Université

HAL Descartes

Agritrop

HAL-CIRAD

Compilation for heterogeneous SoCs: bridging the gap between software and target-specific mechanisms

Authors: Charles Henri-Pierre; Dardaillon Mickaël; Marquet Kevin; Martin Jerome; Risset Tanguy
Publication venue: HAL CCSD
Publication date: 22/01/2014

Current application constraints are pushing for higher computation power while reducing energy consumption, driving the development of increasingly specialized SoCs. In the meantime, these SoCs are still programmed in assembly language to make use of their specific hardware mechanisms. Because the constraints on hardware development bring specialization, and hence heterogeneity, it is essential to support these new mechanisms using high-level programming. In this work, we use a parametric data flow formalism to abstract the application from any hardware platform. From this premise, we propose to contribute to the compilation of target-independent programs on heterogeneous platforms. These developments are threefold: 1) the support of hardware accelerators for computation using actor fusion, 2) the automatic generation of communications on complex memory layouts and 3) the synchronization of distributed cores using hardware mechanisms for scheduling. The code generation is illustrated on a heterogeneous SoC dedicated to telecommunications.

INRIA a CCSD electronic archive server

HAL-CEA

Contrôle d'application flot de données pour les systèmes sur puces : étude de cas sur la plateforme Magali [Dataflow application control for systems-on-chip: a case study on the Magali platform]

Authors: Charles Henri-Pierre; Dardaillon Mickaël; Marquet Kevin; Martin Jérôme; Risset Tanguy
Publication venue: HAL CCSD
Publication date: 23/04/2014

Embedded applications demand ever more computing power for less energy consumption, which has led to the emergence of dedicated systems-on-chip. In the signal processing domain, the dataflow model of computation is commonly used to program these systems-on-chip. An execution model suited to these architectures and meeting the application constraints is therefore needed. In this work, we propose a new execution model for controlling dataflow applications. Our approach relies on the links between the characteristics of the applications and their performance under the associated execution model. This work is illustrated with a case study on the Magali platform.

INRIA a CCSD electronic archive server

HAL-CEA

Cognitive Radio Programming: Existing Solutions and Open Issues

Authors: Charles Henri-Pierre; Dardaillon Mickaël; Marquet Kevin; Martin Jérôme; Risset Tanguy
Publication venue: HAL CCSD
Publication date: 08/09/2013

Software defined radio (SDR) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, many issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and we present current architectures and solutions for programming SDR. We also list the challenges to overcome in order to reach mastery of future cognitive radio systems.

INRIA a CCSD electronic archive server

HAL-CEA

deGoal a tool to embed dynamic code generators into applications

Authors: Charles Henri-Pierre; Couroussé Damien; Endo Fernando A.; Gauguey Rémy; Lomüller Victor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

The processing applications now being used on mobile and embedded platforms require both a fair amount of processing power and a high level of flexibility, due to the nature of the data to process. In this context we propose a lightweight code generation technique that is able to perform data-dependent optimizations at run time for processing kernels. In this paper we present the motivations for, and how to use, deGoal, a tool designed to build fast and portable binary code generators called compilettes.

Crossref

HAL-CEA

Code Generation for an Application-Specific VLIW Processor With Clustered, Addressable Register Files

Authors: Bernard Christian; Charles Henri-Pierre; Cohen Albert; Fabre Christian; Llopard Ivan; Martin Jérôme
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/02/2013

Modern compilers integrate recent advances in compiler construction, intermediate representations, algorithms and programming language front-ends. Yet code generation for application-specific architectures benefits only marginally from this trend, as most of the effort is oriented towards popular general-purpose architectures. Historically, non-orthogonal architectures have relied on custom compiler technologies, some retargetable, but largely decoupled from the evolution of mainstream tool flows. Very Long Instruction Word (VLIW) architectures have introduced a variety of interesting problems such as clusterization, packetization or bundling, instruction scheduling for exposed pipelines, long delay slots, software pipelining, etc. These have been addressed in the literature, with a focus on the exploitation of Instruction Level Parallelism (ILP). While these are well-known solutions already embedded into existing compilers, they rely on common hardware functionalities that are expected to be present in a fairly large subset of VLIW architectures. This paper presents our work on a back-end compiler, based on LLVM, for Mephisto, a high-performance, low-power application-specific processor. Mephisto is specialized enough to challenge established code generation solutions for VLIW and DSP processors, calling for an innovative compilation flow. Conversely, even though Mephisto might be seen as a somewhat exotic processor, its hardware characteristics such as addressable register files benefit from existing analyses and transformations in LLVM. We describe our model of the Mephisto architecture, the difficulties we encountered, and the associated compilation methods, some of them new and specific to Mephisto.

INRIA a CCSD electronic archive server

HAL-CEA